library(tidyverse)
library(here)

Is there a lag in biological response to oceanographic condition?

Blue Rockfish

Biological Response: I will use Blue Rockfish catch per unit effort (CPUE) to start because they are the most abundant species in the south-central coast of California.

Oceanographic Index: I am going to use the Multivariate Ocean Climate Index (MOCI) to start since it’s component parts include the pacific decadal oscillation, multivariate ENSO index, sea surface temperature, upwelling index, etc. The data can be found here: http://www.faralloninstitute.org/moci

Sub-question: Is there a difference in lag time between juvenile and adult blue rockfish?

Below is the growth curve of Blue rockfish for reference. When all fish are mature, they put less energy into growth and start to put their energy towards reproduction and the line flattens out. They are maturing somewhere in the steep slope of this curve, and I want to choose a size roughly halfway on that part of the curve. Additionally, a recent masters thesis on Blue rockfish found that 50% length at maturity is about 23 cm (Schmidt 2014). So I am using that as my size cut-off for juveniles and adults with 23 cm fish included in the juvenile data.

Terms Reference:

MOCI <- read_csv(here("Data", "Central_California_MOCI.csv"))

full_cp_ml <- read_csv(here("Data/output", "2020-11-18_master_II.csv"))

## filtering for Blue rockfish only
blue_full <- full_cp_ml %>%
  filter(species == "BLU")

Nested Design

drift > month > cell > site > area > year

Data cleaning to account for potential uneven sampling within nested design

Namely, we sample cells randomly with replacement. So we can potentially sample a cell multiple times in a season (or not sample it at all).

size_cutoff <- 23

## this is cpue at the drift level
juv <- blue_full %>%
  filter(size <= size_cutoff)%>%
  group_by(drift, trip, area, site, month, day, year, gridcell) %>%
  summarise(cpue_sum = sum(cpue))

## mean cpue at the date (month) level 
juv_2 <- juv %>%
  group_by(trip, area, site, month, year, gridcell) %>%
  summarise(cpue_date = mean(cpue_sum))

## mean cpue at the cell level
juv_3 <- juv_2 %>%
  group_by( area, site, year, gridcell) %>%
  summarise(cpue_cell = mean(cpue_date))

## mean cpue a the site (protection status level)
juv_4 <- juv_3 %>%
  group_by( area, site, year) %>%
  summarise(cpue_site = mean(cpue_cell))

## all in one step for adults
adult <- blue_full %>%
  filter(size > size_cutoff)%>%
  group_by(drift, trip, area, site, month, day, year, gridcell) %>%
  summarise(cpue_sum = sum(cpue))%>%
  group_by(trip, area, site, month, year, gridcell) %>%
  summarise(cpue_date = mean(cpue_sum)) %>%
  group_by( area, site, year, gridcell) %>%
  summarise(cpue_cell = mean(cpue_date)) %>%
  group_by( area, site, year) %>%
  summarise(cpue_site = mean(cpue_cell))

## not separated by size
full <- blue_full %>%
  group_by(drift, trip, area, site, month, day, year, gridcell) %>%
  summarise(cpue_sum = sum(cpue))%>%
  group_by(trip, area, site, month, year, gridcell) %>%
  summarise(cpue_date = mean(cpue_sum)) %>%
  group_by( area, site, year, gridcell) %>%
  summarise(cpue_cell = mean(cpue_date)) %>%
  group_by( area, site, year) %>%
  summarise(cpue_site = mean(cpue_cell))


juv_moci <- left_join(juv_4, MOCI, by = "year") 

adult_moci <- left_join(adult, MOCI, by = "year")

full_moci <- left_join(full, MOCI, by = "year")

Each of these datasets juv_moci, adult_moci, and full_moci, will now have the CPUE value and MOCI index value for each area + site + year + season combination.

juv_moci %>% head()
## # A tibble: 6 x 6
## # Groups:   area, site [1]
##   area  site   year cpue_site season central_ca
##   <chr> <chr> <dbl>     <dbl> <chr>       <dbl>
## 1 AN    MPA    2007     0.603 JFM        -6.27 
## 2 AN    MPA    2007     0.603 AMJ        -5.00 
## 3 AN    MPA    2007     0.603 JAS        -0.925
## 4 AN    MPA    2007     0.603 OND        -6.01 
## 5 AN    MPA    2008     1.06  JFM        -6.18 
## 6 AN    MPA    2008     1.06  AMJ        -6.89

Cross correlation function loops

Full data with protection status included

headers <- data.frame(area = character(), site = character(), 
                      season = character(), lag = character(), corr_values = character())

juv_corr <- headers
adult_corr <- headers
full_corr <- headers

area_list <- c("PB", "BL", "AN", "PL")
site_list <- c("MPA", "REF")
season_list <- c("JFM", "AMJ", "JAS", "OND")
for(j in area_list){
  for(k in site_list){
    for(l in season_list){
      full_combos <- full_moci%>%
      filter(area == j, site == k, season == l)
      cross_corr <- ccf(full_combos$central_ca,
                        full_combos$cpue_site, plot = F)
      lag <- cross_corr$lag
      corr_values <- cross_corr$acf
      for(m in 1:length(lag)){
      full_corr <- bind_rows(full_corr, c(area = j, site = k, season = l, 
                                     lag = lag[m], corr_values = corr_values[m]))
      }

    }
  }
}

I only need the negative lag values and seasons are plotted in alphabetical order unless I set them as factors with each name being a level.

vis_full <- full_corr %>%
  filter(lag %in% c(-10:0))%>%
  mutate(corr_values = as.numeric(corr_values),
         lag = as.numeric(lag))

vis_full$season <- factor(vis_full$season, levels = c("JFM", "AMJ", "JAS", "OND"))

full_moci_plot <- ggplot() +
  scale_color_manual(values = c(  "#999999" , "#009999", "#666666", "#000000"))+
  scale_fill_manual(values = c( "#999999", "#009999", "#666666", "#000000"))+
  geom_col(position = "dodge2", data = vis_full, aes(lag, y = corr_values,  color = season,
                                               fill = season))+
  labs(title = "Blue Rockfish Lag Correlation -MOCI", y = "correlation",
       caption = "Fig 1")+
  scale_x_continuous(limits=c(-8.5,0.5),breaks=seq(-8,0,1))+
  scale_y_continuous(limits = c(-0.6, 0.9), breaks = seq(-0.6, 0.9, 0.2), 
                     labels = c(-0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8) )+
  theme_bw()+
  facet_grid(site ~ area)+
    theme(panel.grid=element_blank(),
          plot.caption = element_text(size = 8))
full_moci_plot

What I see here is that MPAs generally have higher correlation between cpue and whatever lag value it is evaluating, but that MPA and REF follow the same pattern within the same area. The between area differences are greater than between protection status.

Juvenile vs Adult with protection status included

## juv loop
for(j in area_list){
  for(k in site_list){
    for(l in season_list){
      juv_combos <- juv_moci%>%
      filter(area == j, site == k, season == l)
      cross_corr <- ccf(juv_combos$central_ca,
                        juv_combos$cpue_site, plot = F)
      lag <- cross_corr$lag
      corr_values <- cross_corr$acf
      for(m in 1:length(lag)){
      juv_corr <- bind_rows(juv_corr, c(area = j, site = k, season = l, 
                                     lag = lag[m], corr_values = corr_values[m]))
      }

    }
  }
}

## adult loop
for(j in area_list){
  for(k in site_list){
    for(l in season_list){
      adult_combos <- adult_moci%>%
      filter(area == j, site == k, season == l)
      cross_corr <- ccf(adult_combos$central_ca,
                        adult_combos$cpue_site, plot = F)
      lag <- cross_corr$lag
      corr_values <- cross_corr$acf
      for(m in 1:length(lag)){
      adult_corr <- bind_rows(adult_corr, c(area = j, site = k, season = l, 
                                     lag = lag[m], corr_values = corr_values[m]))
      }

    }
  }
}
## juv plot set up
vis_juv <- juv_corr %>%
  filter(lag %in% c(-10:0))%>%
  mutate(corr_values = as.numeric(corr_values),
         lag = as.numeric(lag))

vis_juv$season <- factor(vis_juv$season, levels = c("JFM", "AMJ", "JAS", "OND"))

juv_moci_plot <- ggplot() +
  scale_color_manual(values = c(  "#999999" , "#009999", "#666666", "#000000"))+
  scale_fill_manual(values = c( "#999999", "#009999", "#666666", "#000000"))+
  geom_col(position = "dodge2", data = vis_juv, aes(lag, y = corr_values,  color = season,
                                               fill = season))+
  labs(title = "Juvenile Blue Rockfish Lag Correlation -MOCI", y = "correlation",
       caption = "Fig 2")+
  scale_x_continuous(limits=c(-8.5,0.5),breaks=seq(-8,0,1))+
  scale_y_continuous(limits = c(-0.6, 0.95), breaks = seq(-0.6, 0.9, 0.2), 
                     labels = c(-0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8) )+
  theme_bw()+
  facet_grid(site ~ area)+
    theme(panel.grid=element_blank(),
          plot.caption = element_text(size = 8))

## adult plot set up
vis_adult <- adult_corr %>%
  filter(lag %in% c(-10:0))%>%
  mutate(corr_values = as.numeric(corr_values),
         lag = as.numeric(lag))

vis_adult$season <- factor(vis_adult$season, levels = c("JFM", "AMJ", "JAS", "OND"))

adult_moci_plot <- ggplot() +
  scale_color_manual(values = c(  "#999999" , "#009999", "#666666", "#000000"))+
  scale_fill_manual(values = c( "#999999", "#009999", "#666666", "#000000"))+
  geom_col(position = "dodge2", data = vis_adult, aes(lag, y = corr_values,  color = season,
                                               fill = season))+
  labs(title = "Adult Blue Rockfish Lag Correlation -MOCI", y = "correlation",
       caption = "Fig 3")+
  scale_x_continuous(limits=c(-8.5,0.5),breaks=seq(-8,0,1))+
  scale_y_continuous(limits = c(-0.6, 0.9), breaks = seq(-0.6, 0.9, 0.2), 
                     labels = c(-0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8) )+
  theme_bw()+
  facet_grid(site ~ area)+
    theme(panel.grid=element_blank(),
          plot.caption = element_text(size = 8))
juv_moci_plot

adult_moci_plot

At the size cutoff of 20 cm, the correlation values look pretty different for Blue rockfish. I need to figure out if this size cutoff is actually telling me something important though.

Full data without protection status

There is some indication that area is more important than protection status, so I have also run these without the site aspect.

full_corr_ns <- headers


for(j in area_list){
  for(l in season_list){
      full_combos_ns <- full_moci%>%
      filter(area == j, season == l)
      cross_corr <- ccf(full_combos_ns$central_ca,
                        full_combos_ns$cpue_site, plot = F) #plot = F
      lag <- cross_corr$lag
      corr_values <- cross_corr$acf
      for(m in 1:length(lag)){
      full_corr_ns <- bind_rows(full_corr_ns, c(area = j,  season = l, 
                                     lag = lag[m], corr_values = corr_values[m]))
      }
  }
}

vis_full_ns <- full_corr_ns %>%
  filter(lag %in% c(-10:0))%>%
  mutate(corr_values = as.numeric(corr_values),
         lag = as.numeric(lag))
vis_full_ns$season <- factor(vis_full_ns$season, levels = c("JFM", "AMJ", "JAS", "OND"))

full_moci_ns <-ggplot() +
  scale_color_manual(values = c( "#999999" ,  "#009999", "#666666", "#000000"))+
  scale_fill_manual(values = c( "#999999", "#009999", "#666666", "#000000"))+
  geom_col(position = "dodge2", data = vis_full_ns, aes(lag, y = corr_values,  color = season,
                                               fill = season))+
  labs(title = "Blue Rockfish Lag Correlation -MOCI", y = "correlation",
       caption = "Fig 4")+
  scale_x_continuous(limits=c(-10.5,0.5),breaks=seq(-10,0,1))+
  scale_y_continuous(limits = c(-0.6, 0.85), breaks = seq(-0.6, 0.8, 0.2),
                     labels = c(-0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8))+
  theme_bw()+
  facet_wrap(.~ area)+
    theme(panel.grid=element_blank(),
          plot.caption = element_text(size = 8))
full_moci_ns

When I compare this figure with combined MPA/REF to the first figure, the patterns are more similar to the MPA, so the MPA may be driving the combined correlation values.

MOCI vs Component Parts

MOCI includes the following:

An interesting direction would be to run the same ccf() function with each of the component parts to see what is driving the relationship seen in MOCI vs CPUE. I am also going to investigate other fish besides Blue rockfish.

Questions for you:

  1. In the data cleaning section, I use group_by() and summarise() several times to make sure that I am accounting for cpue across multiple levels instead of jumping straight to the last step and taking the mean cpue of the entire year. Can I track variance/sd/standard error the same way? For example, asking for the variance of the level before in each of the summarise() functions (look in the last line of each summarise())?
## this is cpue at the drift level
juv <- blue_full %>%
  filter(size <= size_cutoff)%>%
  group_by(drift, trip, area, site, month, day, year, gridcell) %>%
  summarise(cpue_sum = sum(cpue))

## mean cpue at the date (month) level 
juv_2 <- juv %>%
  group_by(trip, area, site, month, year, gridcell) %>%
  summarise(cpue_date = mean(cpue_sum),
            var_date = var(cpue_sum)) ## look here

## mean cpue at the cell level
juv_3 <- juv_2 %>%
  group_by( area, site, year, gridcell) %>%
  summarise(cpue_cell = mean(cpue_date),
            var_cell = var(cpue_date)) ## look here

## mean cpue a the site (protection status level)
juv_4 <- juv_3 %>%
  group_by( area, site, year) %>%
  summarise(cpue_site = mean(cpue_cell),
            var_site = var(cpue_cell)) ## look here
  1. We only sample in one ‘season’ out of the year, so I assigned the same catch per unit effort value from an area-site-year to each MOCI season in that year so that I would be able to get correlation values for each season. Otherwise I would have to average the MOCI values across seasons to get one value per year and use that for the cross correlation. Do you have thoughts on if that is okay to do? See below for example: all AN-MPA-2007 will have a cpue value of 1.45 even though each season has a separate MOCI value.
## # A tibble: 6 x 6
## # Groups:   area, site [1]
##   area  site   year cpue_site season central_ca
##   <chr> <chr> <dbl>     <dbl> <chr>       <dbl>
## 1 AN    MPA    2007      1.45 JFM        -6.27 
## 2 AN    MPA    2007      1.45 AMJ        -5.00 
## 3 AN    MPA    2007      1.45 JAS        -0.925
## 4 AN    MPA    2007      1.45 OND        -6.01 
## 5 AN    MPA    2008      2.55 JFM        -6.18 
## 6 AN    MPA    2008      2.55 AMJ        -6.89
  1. I just want to double check that the cross correlation function is doing what I think it is doing. What I think is that it takes the cpue value for every area-site-season-year combination and compares it to the MOCI value for every area-site-season-year combination and spits out a correlation. Then it shifts the values in one direction and evaluates it again (and again etc. in both directions). In the first figure, for the Ano Nuevo (AN) MPA I see a correlation value of ~0.9 for the -2 lag in the July-August-September (JAS) season - so my interpretation would be something like ‘Blue rockfish cpue in the Ano Nuevo MPA has the highest correlation with the MOCI value two summers in the past’. See below for close up of that part of Fig 1.

Reference

Schmidt, K. T. Life History Changes in Female Blue Rockfish, Sebastes Mystinus, Before and After Overfishing, in Central California. Masters Thesis California State University, Monterey Bay (2014)